Chinese Word Segmentation Using Various Dictionaries
Abstract
Most Chinese word segmentation systems use a monolingual dictionary and are designed for monolingual processing. For machine translation (MT) and cross-language information retrieval (CLIR), a separate translation dictionary may be used to transfer the words of documents from the source language to the target language. Inconsistencies between the two types of dictionaries (segmentation dictionary and transfer dictionary) can cause problems for MT and CLIR. This paper shows the effectiveness of external resources (a bilingual dictionary and a word list) for Chinese word segmentation.
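The dictionary inconsistency described above can be illustrated with a small sketch. The abstract does not state the paper's own segmentation algorithm, so forward maximum matching is used here only as a common dictionary-based stand-in. The function names (`fmm_segment`, `untranslatable`) and both toy dictionaries are hypothetical and not taken from the paper.

```python
def fmm_segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position, take the longest
    dictionary word; fall back to a single character if none matches."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def untranslatable(words, transfer_dict):
    """Return the segmented words that have no entry in the transfer dictionary."""
    return [w for w in words if w not in transfer_dict]


# Toy data (assumed for illustration only).
segmentation_dict = {"机器翻译", "系统"}  # monolingual segmentation dictionary
transfer_dict = {"机器": "machine", "翻译": "translation", "系统": "system"}

text = "机器翻译系统"

# Segmenting with the monolingual dictionary yields a word that the
# transfer dictionary cannot translate.
mono = fmm_segment(text, segmentation_dict)
missing = untranslatable(mono, transfer_dict)

# Segmenting with the entries of the bilingual dictionary keeps every
# segment translatable.
bilingual = fmm_segment(text, set(transfer_dict))
```

In this toy case the monolingual dictionary produces the segment 机器翻译, which the transfer dictionary does not list, while segmenting with the bilingual dictionary's own entries leaves nothing untranslatable. Whether that trade-off holds on real data is what the paper evaluates.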
Similar resources
A Web-based Approach To Chinese Word Segmentation
Chinese text processing requires the detection of word boundaries. This is a non-trivial step because Chinese does not contain explicit whitespace between words. Existing word segmentation techniques make use of precompiled dictionaries and treebanks. The creation of dictionaries and treebanks is a labor-intensive process and consequently they are updated infrequently. Furthermore, due to their...
Experiments on Unsupervised Chinese Word Segmentation and Classification
Several problems arise in Chinese language processing because Chinese is written without word delimiters. The difficulty of defining a word makes it even harder. This paper explores the possibility of automatically segmenting Chinese character sequences into words and classifying these words through distributional analysis, in contrast with the usual approaches that depend on dictio...
English-Chinese Cross-Language IR Using Bilingual Dictionaries
This report describes the English-Chinese cross-language experiments at Berkeley for the TREC-9 Cross-Language Information Retrieval track. We present a simple and effective Chinese word segmentation method and compare the cross-language retrieval performance of two bilingual dictionaries for query translation.
Exploiting Shared Chinese Characters in Chinese Word Segmentation Optimization for Chinese-Japanese Machine Translation
Unknown words and word segmentation granularity are two main problems in Chinese word segmentation for Chinese-Japanese Machine Translation (MT). In this paper, we propose an approach that exploits common Chinese characters shared between Chinese and Japanese to optimize Chinese word segmentation for MT, aiming to solve these problems. We augment the system dictionary of a Chinese segmenter b...
Automatic Morphological Parsing of Chinese
This paper provides a basic design for an automatic morphological parser of Chinese that uses the syntactic word definition for word segmentation and tries to manage with as few resources as possible. Two possible resource bases are suggested: a dictionary of Chinese characters with their default parts of speech, or a small dictionary with some common words and their parts of speech to be u...